Recent algorithms in convolutional neural networks (CNN) considerably advancethe fine-grained image classification, which aims to differentiate subtledifferences among subordinate classes. However, previous studies have rarelyfocused on learning a fined-grained and structured feature representation thatis able to locate similar images at different levels of relevance, e.g.,discovering cars from the same make or the same model, both of which requirehigh precision. In this paper, we propose two main contributions to tackle thisproblem. 1) A multi-task learning framework is designed to effectively learnfine-grained feature representations by jointly optimizing both classificationand similarity constraints. 2) To model the multi-level relevance, labelstructures such as hierarchy or shared attributes are seamlessly embedded intothe framework by generalizing the triplet loss. Extensive and thoroughexperiments have been conducted on three fine-grained datasets, i.e., theStanford car, the car-333, and the food datasets, which contain eitherhierarchical labels or shared attributes. Our proposed method has achieved verycompetitive performance, i.e., among state-of-the-art classification accuracy.More importantly, it significantly outperforms previous fine-grained featurerepresentations for image retrieval at different levels of relevance.
展开▼